On Bayesian models and event spaces in information retrieval

Author

  • Stephen Robertson
Abstract

There have been several attempts recently to reconcile, or at least to understand the relationship between, traditional probabilistic models of information retrieval and the newer language models. Since both treat the retrieval problem probabilistically, it might be expected that they can be formulated in comparable terms. However, this has proved difficult. One question concerns the role of relevance, which takes a central position in some traditional models (such as Robertson and Sparck Jones [1976], referred to as RSJ), but does not appear explicitly in at least the early language models (e.g. Ponte and Croft [1998]). The present author and others [Sparck Jones et al. 2002] have recently claimed that the early language models assume that there is only one relevant document per query. This claim is based on the observation that language models ask the question of each document: What is the probability that this document, or rather the model which generated this document, also generated the query? Since each document is taken to have its own language model, if it turns out that a particular document is relevant (that is, its model did indeed generate the query), it would seem that no other model could have done. Lafferty and Zhai [2002], on the other hand, in a recent paper, develop a basic probabilistic model from which they derive both the RSJ model and the simple language model. They claim in conclusion that (a) RSJ and the simple language model are equivalent; and (b) that the language model requires no such assumption as that there is only one relevant document per query. The present paper discusses an issue underlying all probabilistic models, that of the event space assumed, and draws in part from a pair of old papers [Robertson et al. 1982; Robertson et al. 1983].
I discuss possible views of the event space in the case of documents, queries and relevance judgements, and come to some different conclusions about the relationship between RSJ and the simple language models. However, in order to illustrate the event space issues, the paper first introduces a rather different example from the IR one, with different structural characteristics.

Similar resources

Using multivariate generalized linear latent variable models to measure the difference in event count for stranded marine animals

BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them very important. This has led to the need to retrieve and understand the data on event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are g...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Prior Information and the Determination of Event Spaces in Probabilistic Information Retrieval Models

A mismatch between different event spaces has been used to argue against rank equivalence of classic probabilistic models of information retrieval and language models. We question the effectiveness of this strategy and we argue that a convincing solution should be sought in a correct procedure to design adequate priors for probabilistic reasoning. Acknowledging our solution of the event space issue in...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Publication date: 2002